

LightRNN: Memory and Computation-Efficient Recurrent Neural Networks

Neural Information Processing Systems

Recurrent neural networks (RNNs) have achieved state-of-the-art performance in many natural language processing tasks, such as language modeling and machine translation. However, when the vocabulary is large, the RNN model becomes very big (e.g., possibly beyond the memory capacity of a GPU device) and its training becomes very inefficient. In this work, we propose a novel technique to tackle this challenge. The key idea is to use a 2-Component (2C) shared embedding for word representations. We allocate every word in the vocabulary to a cell of a table, where each row is associated with one vector and each column with another vector, so that a word is represented by its row and column vectors jointly.
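The parameter saving behind the 2C shared embedding can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the row-major word-to-cell allocation, the dimensions, and the concatenation of the two component vectors are all assumptions made here for clarity (LightRNN itself learns the allocation during training).

```python
import numpy as np

V = 10_000                                 # vocabulary size
side = int(np.ceil(np.sqrt(V)))            # words laid out in a sqrt(V) x sqrt(V) table
d = 64                                     # dimension of each component vector

rng = np.random.default_rng(0)
row_vectors = rng.standard_normal((side, d))  # one shared vector per table row
col_vectors = rng.standard_normal((side, d))  # one shared vector per table column

def word_embedding(word_id: int) -> np.ndarray:
    """2C shared embedding: combine the word's row and column vectors.

    Assumes a simple row-major allocation of words to table cells,
    which is a stand-in for the learned allocation in LightRNN.
    """
    r, c = divmod(word_id, side)
    return np.concatenate([row_vectors[r], col_vectors[c]])

emb = word_embedding(1234)
# Embedding parameters: 2 * sqrt(V) * d instead of V * d,
# i.e. 2*100*64 = 12,800 values instead of 10,000*64 = 640,000.
```

Because every row (and column) vector is shared by ~sqrt(V) words, the embedding table shrinks from O(V) to O(sqrt(V)) vectors, which is what keeps large-vocabulary models within GPU memory.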







Imaging with super-resolution in changing random media

Christie, Alexander, Leibovich, Matan, Moscoso, Miguel, Novikov, Alexei, Papanicolaou, George, Tsogka, Chrysoula

arXiv.org Artificial Intelligence

High-resolution imaging from array data in unknown inhomogeneous ambient media requires estimating both the medium properties and the object characteristics. For diverse measurements collected from different sources in different, changing media, we introduce in this paper an algorithm that recovers the ambient media properties needed for high-resolution imaging as well as the source locations and strengths that constitute the imaging target. This algorithm extends and improves upon our previous work on imaging through random media using array data. Previously, we addressed imaging through a single unknown random medium, either weakly scattering [ 1 ] or strongly scattering [ 2 ].



comments. Reviewer # 1 wants to see an algorithm that works when b

Neural Information Processing Systems

We thank all the reviewers for their time and valuable comments. "Provide an algorithm to output a distribution that's close to the target, even if b has negative components." We will mention this in the paper. This is an interesting direction for future research. "What happens when we increase the number of layers?"



DHO$_2$: Accelerating Distributed Hybrid Order Optimization via Model Parallelism and ADMM

Gu, Shunxian, You, Chaoqun, Ren, Bangbang, Luo, Lailong, Xia, Junxu, Guo, Deke

arXiv.org Artificial Intelligence

Scaling deep neural network (DNN) training to more devices can reduce time-to-solution, but is impractical for users with limited computing resources. FOSI, a hybrid order optimizer, converges faster than conventional optimizers by exploiting both gradient information and curvature information when updating the DNN model, which opens a new opportunity for accelerating DNN training in the resource-constrained setting. In this paper, we explore its distributed design, namely DHO$_2$, including distributed calculation of curvature information and model updates with partial curvature information, to accelerate DNN training with a low memory burden. To further reduce the training time, we design a novel strategy that parallelizes the calculation of curvature information and the model update across different devices. Experimentally, our distributed design achieves an approximately linear reduction of the memory burden on each device as the number of devices increases. Meanwhile, it achieves a $1.4\times\sim2.1\times$ speedup in total training time compared with other distributed designs based on conventional first- and second-order optimizers.
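The hybrid first-/second-order idea the abstract alludes to can be illustrated with a toy update: precondition the gradient by the curvature in a few top eigendirections (the "partial curvature information") and take a plain scaled gradient step in the remaining directions. This is a generic sketch of the hybrid-order principle, not DHO$_2$ itself; the Hessian, gradient, and subspace size below are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n))
H = A @ A.T + n * np.eye(n)      # hypothetical positive-definite Hessian
g = rng.standard_normal(n)       # hypothetical gradient at the current point

k = 2                            # use curvature only in the top-k eigendirections
eigvals, eigvecs = np.linalg.eigh(H)
U = eigvecs[:, -k:]              # top-k eigenvectors (partial curvature info)
lam = eigvals[-k:]               # their eigenvalues

coords = U.T @ g                 # gradient coordinates in the curvature subspace
g_top = U @ coords               # gradient component inside the subspace
g_rest = g - g_top               # component in the orthogonal complement

lr = 0.1
newton_step = U @ (coords / lam)   # Newton-like (second-order) step in the subspace
update = newton_step + lr * g_rest # plain first-order step in the complement
theta = np.zeros(n) - update       # one hybrid parameter update
```

Keeping only k eigendirections is what makes the curvature information cheap to compute and communicate; in a distributed design, this subspace computation and the gradient-based update can run on different devices.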